The Centre for Digital Humanities and Information Society invites all master's students, doctoral students and University of Tartu (TÜ) staff to the workshops "Methods of extracting keywords and topics from text collections", led by Maciej Eder (Institute of Polish Language, Polish Academy of Sciences). The workshops take place in the computer lab at Jakobi 2–105 at the following times
Attending all of the workshops earns 1 ECTS credit (EAP). To take part, please fill in the registration form.
---
"Methods of extracting keywords and topics from text collections"
The workshop will offer an introduction to methods for extracting information from collections of written texts. It will start with keyword analysis and different methods of extracting keywords, followed by a discussion of topic modeling, a technique for extracting groups of semantically related words from text collections. Keyword analysis in its different flavors (LL keywords, Zeta, tf-idf) identifies the most relevant words in a collection of texts, or between two subcorpora, or even between two texts, by comparing particular word frequencies and determining their statistical significance. Topic modeling, on the other hand, provides a more comprehensive search for words that exhibit some semantic similarity – as defined by their textual contexts – which allows for discovering the latent thematic structure of the documents in question.
The workshop will be divided into a theoretical introduction followed by a hands-on session. The tools used include AntConc (a freeware standalone tool that implements keyword extraction) and the R programming environment, which will be used for topic modeling.
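To make the idea of comparing word frequencies between two subcorpora concrete, here is a minimal Python sketch of log-likelihood (LL) keyness. It is an illustration only and not the workshop's own material: it assumes the common two-term form of the G2 statistic, takes pre-tokenized word lists as input, and the function names `log_likelihood` and `keywords` are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 score for a word seen a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens (two-term form, assumed here)."""
    # Expected counts if the word were equally frequent in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(target_tokens, reference_tokens, top_n=10):
    """Return the top_n words over-represented in the target, ranked by G2."""
    if not target_tokens or not reference_tokens:
        raise ValueError("both corpora must contain at least one token")
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c, d = sum(target.values()), sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference[word]  # Counter returns 0 for unseen words
        # Keep only words relatively more frequent in the target corpus.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    # Toy corpora, invented purely for illustration.
    target = ["whale"] * 5 + ["the"] * 5
    reference = ["the"] * 5 + ["sea"] * 5
    for word, score in keywords(target, reference):
        print(f"{word}\t{score:.2f}")
```

A higher score means the difference in relative frequency is less likely to be due to chance. Tools such as AntConc may apply different variants of the statistic, thresholds, or corrections, so scores from this sketch should not be expected to match theirs exactly.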
Du, K., Dudar, J. and Schöch, C. (2022). Evaluation of Measures of Distinctiveness. Classification of Literary Texts on the Basis of Distinctive Words. Journal of Computational Literary Studies, 1(1), https://doi.org/10.48694/jcls.102
Michel, J.-B., Shen, Y. K., Aiden, A. P., Veres, A., Gray, M. K., Pickett, J. P., Hoiberg, D., et al. (2011). Quantitative analysis of culture using millions of digitized books. Science, 331(6014): 176–82, https://www.science.org/doi/10.1126/science.1199644
Jockers, M. L. (2013). Macroanalysis: Digital Methods and Literary History. University of Illinois Press.
Paquot, M. and Bestgen, Y. (2009). Distinctive words in academic writing: A comparison of three statistical tests for keyword extraction. In Jucker, A. H., Schreier, D. and Hundt, M. (eds). Amsterdam / New York: Rodopi, pp. 247–69.
Goldstone, A. and Underwood, T. (2012). What can topic models of PMLA teach us about the history of literary scholarship? Journal of Digital Humanities, 2(1), https://journalofdigitalhumanities.org/2-1/what-can-topic-models-of-pmla-teach-us-by-ted-underwood-and-andrew-goldstone/